Grid'5000 Based Large Scale OCR Using the DTW Algorithm: Case of the Arabic Cursive Writing
Authors
Abstract
Large scale optical character recognition (OCR) refers to the computerization of large volumes of documents such as newspapers. Despite the diversity of commercial OCR products, this task remains far from mature, especially when the input documents are of insufficient quality or use cursive writing, as Arabic documents do (Vinciarelli, 2002). Indeed, in its project report (Holley, 2009), the National Library of Australia states that existing OCR systems are commonly weak. Moreover, its experiments on historical newspapers show that the raw accuracy varied from 71% to 98.02%. This is surely due to the weakness of the approaches and techniques used in these systems. Printed cursive documents such as Arabic ones present additional difficulties, which explain the weaknesses of existing commercial systems, especially when the quality of the input binary image is not good enough. The first difficulty for such writing is segmenting a given input word or sub-word into isolated characters, since the size of each character is variable. In practice, a successful segmentation eases the recognition step to a large extent. That is why OCR systems for printed Latin text are commonly more powerful than those devoted to cursive documents. The Dynamic Time Warping (DTW) algorithm is a well-known procedure, especially in pattern recognition (Alves et al., 2002; Khemakhem et al., 2005; Philip, 1992; Vuori et al., 2001; Khemakhem et al., 2009; Kumar et al., 2006; Tapia et al., 2007). It results from adapting dynamic programming to the field of pattern recognition. Applied to printed cursive writing (such as Arabic, Persian, Urdu, or connected Latin characters), DTW-based OCR provides very interesting recognition rates without prior character segmentation (Khemakhem et al., 2005).
The purpose of the DTW algorithm is to perform an optimal time alignment between a reference pattern and an unknown pattern and to evaluate their difference. Intensive experiments show that the recognition rate of the DTW algorithm remains acceptable compared to existing commercial systems even when the quality of the input documents is not good enough. Intensive tests on more than 100,000 connected characters (most of them Arabic cursive, some with significant noise) show that the average segmentation rate is greater than 98% and the average recognition rate is 5
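The alignment idea described in the abstract can be illustrated with a minimal sketch of the textbook DTW recurrence. This is not the authors' implementation: the function names `dtw_distance` and `best_match`, the use of one scalar feature per image column, and the absolute-difference local cost are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def dtw_distance(reference: Sequence[float], unknown: Sequence[float]) -> float:
    """Cumulative cost of the cheapest alignment between two 1-D sequences.

    Assumption: each sequence holds one scalar feature per image column
    (for example a black-pixel count); real OCR features are richer.
    """
    n, m = len(reference), len(unknown)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning reference[:i] with unknown[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - unknown[j - 1])  # assumed local cost
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference advances alone
                cost[i][j - 1],      # unknown advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def best_match(references: Dict[str, Sequence[float]], unknown: Sequence[float]) -> str:
    """Return the label of the reference pattern closest to `unknown`."""
    return min(references, key=lambda label: dtw_distance(references[label], unknown))
```

Because the warping path may repeat elements, a stretched copy of a pattern still aligns at zero cost: `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to `0.0`. The nested loops make the cost proportional to the product of the two sequence lengths, which is consistent with the computational burden that the related work listed further down sets out to distribute.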
Similar resources
A P2P Grid Architecture for Distributed Arabic OCR Based on the DTW Algorithm
Arabic cursive optical character recognition (OCR) based on the dynamic time warping (DTW) algorithm simultaneously provides very interesting segmentation and recognition rates. However, the computational complexity of the DTW algorithm restricts its widespread use and its consideration at a commercial scale. Accelerating the DTW execution time has attracted many researchers and several sol...
Arabic Cursive Characters Distributed Recognition using the DTW Algorithm on BOINC
Volunteer computing, or volunteer grid computing, constitutes a very promising infrastructure that provides ample computing and storage power without any prior cost or investment. Indeed, such infrastructures result from federating several geographically dispersed computers and/or LAN computers over the Internet. Berkeley Open Infrastructure for Network Computing (BOINC) is consi...
Performance Evaluation of the distributed Arabic cursive characters recognition using the DTW algorithm on the SRTG
Recognition of printed Arabic cursive characters using the Dynamic Time Warping (DTW) algorithm provides very interesting results. Unfortunately, the large amount of computation this algorithm performs during the recognition process makes its execution very slow. Grid computing presents a very interesting infrastructure that supports distributed applications on the one hand and to ta...
Dynamic Time Warping Algorithm with Distributed Systems
Distributed computing is the method of splitting a large problem into smaller pieces and allocating the workload among many computers. These individual computers process their portions of the problem, and the results are combined to form a solution for the original problem. At present, distributed computing systems can be broadly classified into two methods, namely Grid computing and V...
Towards a distributed Arabic OCR based on the DTW algorithm: performance analysis
In spite of the diversity of printed Arabic optical character recognition products and proposals, the problem does not yet seem to be well solved. The complex morphology and calligraphy of Arabic writing on the one hand, and the use of some lightweight approaches on the other, explain the poor quality of these products. However, some strong proposed approaches didn’t find the opportunity to be commerci...
Publication date: 2012